
FIGURE 3.4
Accuracy (%) with different numbers of clustering centers (2, 3, and 4) for 20-layer MCNs with width 16-16-32-64, with and without center loss.

with a batch size of 128. The performance of MCNs with different values of θ is shown in Fig. 3.7. First, only the effect of θ is evaluated; then the center loss is introduced through a fine-tuning process. The performance is observed to be stable under variations of θ and λ.
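This excerpt does not give the exact form of the center loss used for MCNs, so the sketch below is only a minimal illustration of such a fine-tuning step: it assumes the center loss pulls each convolutional weight toward its nearest clustering center and is added to the task loss with weight λ (the names center_loss and lam, and the two-center choice, are illustrative, not from the original code).

import torch

def center_loss(weights, centers):
    # Mean squared distance from each weight to its nearest clustering center.
    # Both argument names are illustrative, not from the original MCN code.
    dists = (weights.view(-1, 1) - centers.view(1, -1)) ** 2
    return dists.min(dim=1).values.mean()

# Hypothetical fine-tuning step: total loss = task loss + lambda * center loss.
lam = 1e-4                                  # assumed weight of the center-loss term
weights = torch.randn(64, 16, 3, 3, requires_grad=True)
centers = torch.tensor([-1.0, 1.0])         # two clustering centers (U = 2)
task_loss = torch.tensor(0.0)               # stands in for the cross-entropy loss
loss = task_loss + lam * center_loss(weights, centers)
loss.backward()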

The number of clustering centers: We evaluate the quantization with U = 2, 3, and 4, where U denotes the number of clustering centers. In this experiment, we investigate the effect of varying the number of clustering centers in MCNs on CIFAR-10.
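To make the role of U concrete, here is a minimal sketch of one plausible quantization step, assuming each weight is simply mapped to its nearest clustering center; the center values are placeholders, and the exact assignment rule in MCNs may differ.

import torch

def quantize_to_centers(weights, centers):
    # Map every weight to the nearest of the U clustering centers.
    dists = (weights.unsqueeze(-1) - centers) ** 2    # distance to each center
    return centers[dists.argmin(dim=-1)]              # nearest-center lookup

w = torch.randn(16, 16, 3, 3)
q2 = quantize_to_centers(w, torch.tensor([-1.0, 1.0]))              # U = 2
q4 = quantize_to_centers(w, torch.tensor([-1.0, -0.5, 0.5, 1.0]))   # U = 4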

The results are shown in Fig. 3.4: accuracy increases with more clustering centers, and the center loss further improves performance. However, to save storage space and to compare with other binary networks, we use two clustering centers for MCNs in all the following experiments.

Our binarized networks reduce the storage of convolutional layers by a factor of 32 compared with the corresponding full-precision networks, in which each real value is stored with 4 bytes (32 bits). Since only the single fully connected layer of an MCN is not binarized, the storage of the whole network is significantly reduced.
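As a back-of-the-envelope check of this factor, the layer shape below is purely illustrative: a full-precision weight costs 32 bits and a binarized weight 1 bit, so the weight storage of a convolutional layer shrinks by 32x.

def conv_weight_bytes(out_ch, in_ch, k, bits_per_weight):
    # Storage of one convolutional layer's weights, ignoring biases.
    return out_ch * in_ch * k * k * bits_per_weight / 8

full_precision = conv_weight_bytes(64, 32, 3, bits_per_weight=32)
binarized = conv_weight_bytes(64, 32, 3, bits_per_weight=1)
print(full_precision / binarized)   # 32.0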

The architecture parameter K: We also evaluate K, the number of planes in each M-Filter. As the results in Fig. 3.5 reveal, using more planes of each M-Filter to reconstruct the unbinarized filters yields better performance; for example, increasing K from 4 to 8 improves accuracy by 1.02%. For simplicity, we choose K = 4 in the following experiments.
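The reconstruction of unbinarized filters from M-Filters is defined earlier in the chapter; purely to illustrate what K controls, the sketch below assumes each of the K planes of an M-Filter modulates a shared binarized filter element-wise, so that a larger K gives a richer set of reconstructed filters (the names and the exact combination rule are assumptions, not the MCN definition).

import torch

K, in_ch, k = 4, 16, 3                                 # K planes per M-Filter
binarized = torch.sign(torch.randn(in_ch, k, k))       # binarized filter, entries in {-1, +1}
m_filter = torch.rand(K, k, k)                         # the K modulation planes

# Each plane modulates the binarized filter element-wise, yielding K
# reconstructed filters that together approximate the unbinarized filter.
reconstructed = binarized.unsqueeze(0) * m_filter.unsqueeze(1)   # shape (K, in_ch, k, k)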

The width of MCNs: CIFAR-10 is used to evaluate the effect of the width of Wide-ResNets built with MCNs. The accuracy and number of parameters are compared with a recent binary CNN, LBCNN. The basic stage widths (the number of convolution kernels per layer) are set to 16-16-32-64. To compare with LBCNN, we set up 20-layer MCNs with basic block-c (in Fig. 3.9), whose depth is the same as that of LBCNN. We also use other network widths to evaluate the effect of width on MCNs.

The results are shown in Table 3.1. The second column gives the width of each layer of the MCNs, using notation similar to that in [281]. The third column lists the numbers of parameters of the MCNs and of the 20-layer LBCNN with its best result. The fourth column shows the accuracy of the baselines, which are trained with the Wide-ResNets (WRNs) structure using the same depth and width as the MCNs. The last two